Abstract — Air pollution is an environmental issue studied worldwide because of its serious impacts on human health, so forecasting its concentration is of great importance. This study presents an analysis of the application of two Unorganized Machines, Extreme Learning Machines (ELM) and Echo State Networks (ESN), to predict particulate matter with aerodynamic diameter less than 2.5 μm (PM2.5) and less than 10 μm (PM10). The data come from the Kallio and Vallila stations in Helsinki, Finland. The computational results showed that the ELM achieved the best results for PM10, while the ESN achieved the best performance for PM2.5.

Keywords — air pollution, neural networks, particulate matter.

I. INTRODUCTION 

Air pollution has always been an environmental issue worldwide, and its prediction is an important topic, mainly because of its impact on human health. Among urban air pollutants, particulate matter (PM) is considered one of the most harmful, as it is associated with hospital admissions for respiratory and cardiovascular problems, and even with death [1], [2], [3]. Hence, PM concentration forecasting is of great interest for government planning and for warning the population about episodes of severe pollution.

Therefore, the present study presents a brief comparative analysis of the performance of two well-known artificial neural networks classified as Unorganized Machines, the Extreme Learning Machine (ELM) and the Echo State Network (ESN), in predicting particulate matter with aerodynamic diameter less than 10 μm (PM10) and less than 2.5 μm (PM2.5). The database comprises air pollution data for Helsinki, Finland, from two distinct stations, Kallio and Vallila, from 2001 to 2003.

II. PREDICTOR MODELS 

The Extreme Learning Machines (ELM) and the Echo State Networks (ESN), collectively known as Unorganized Machines (UMs), are artificial neural network architectures characterized by a simple training process combined with good results [4]. Their most important characteristic is that the hidden layer remains untrained, so only the output layer is trained, in the minimum mean square error sense, which makes the adjustment process very fast [5].
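To make the output-layer training concrete, consider a minimal formulation in which the symbols H, w and M are introduced here for illustration only: H is the N × M matrix whose rows are the hidden-layer (or reservoir) outputs for the N training samples and M hidden neurons, d is the vector of desired responses, and w holds the output weights. Because the hidden layer is fixed, minimizing the mean square error is a linear least-squares problem with a closed-form solution given by the Moore-Penrose pseudoinverse H⁺:

```latex
\mathbf{w}^{\ast}
  = \arg\min_{\mathbf{w}} \frac{1}{N}\,\lVert \mathbf{d} - \mathbf{H}\mathbf{w} \rVert_2^{2}
  = \mathbf{H}^{+}\mathbf{d},
\qquad
\mathbf{H}^{+} = \left(\mathbf{H}^{\top}\mathbf{H}\right)^{-1}\mathbf{H}^{\top}
\ \text{if } \mathbf{H} \text{ has full column rank.}
```

When H is rank-deficient, H⁺d is still a minimizer, namely the one with the smallest norm.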

The ELMs, proposed by [6], are feedforward networks quite similar to the traditional Multilayer Perceptron (MLP) [7]. The authors proved by a constructive approach that the output error can always be reduced by inserting a new neuron in the hidden layer of a feedforward network, provided one condition holds: the activation function of these neurons must be differentiable. Through a rigorous mathematical demonstration, they also proved that this structure has generalization capability and is a universal approximator. Hence, in prediction tasks, a trained ELM may produce adequate results even for previously unseen input data. The most common way to adjust the output-layer weights is the Moore-Penrose pseudoinverse, which yields, deterministically and in closed form, the weights that minimize the mean square error [5].
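A minimal ELM sketch in Python with NumPy follows. It assumes a tanh activation, hidden weights and biases drawn uniformly from [-1, 1], and a bias column appended to the hidden outputs; these choices and the function names `elm_fit` and `elm_predict` are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np


def _hidden(X, W_in, b):
    # Hidden-layer outputs plus a constant column that acts as the readout bias.
    H = np.tanh(X @ W_in + b)
    return np.column_stack([H, np.ones(len(H))])


def elm_fit(X, d, n_hidden, seed=0):
    """Train a basic ELM: random, untrained hidden layer and least-squares readout."""
    rng = np.random.default_rng(seed)
    # Hidden weights and biases are drawn once at random and never adjusted.
    W_in = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Output weights via the Moore-Penrose pseudoinverse (minimum-MSE solution).
    w_out = np.linalg.pinv(_hidden(X, W_in, b)) @ d
    return W_in, b, w_out


def elm_predict(model, X):
    W_in, b, w_out = model
    return _hidden(X, W_in, b) @ w_out


# Usage: fit a smooth nonlinear mapping and check the training error.
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
d = np.sin(3.0 * X).ravel()
model = elm_fit(X, d, n_hidden=50)
train_mse = np.mean((elm_predict(model, X) - d) ** 2)
print(f"training MSE: {train_mse:.2e}")
```

Only `w_out` is computed from data; the single `pinv` call is what makes ELM training fast compared with iterative backpropagation.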

Unlike the ELM, the ESN is a recurrent network endowed with feedback loops of information. This means that some neuron outputs are fed back into the network, generating an intrinsic memory. This characteristic can be advantageous for problems in which the samples present temporal dependence. The ESNs were proposed by [8] and, like the ELM, are universal approximators with generalization capability [8], [9].

The main difficulty in applying classic recurrent neural networks is the training process, since it requires nonlinear optimization techniques. This procedure may lead to instability and convergence to local minima and, in general, has a high computational cost. As mentioned, in the ESN the weights of the intermediate layer, called the dynamic reservoir, remain unadjusted. The theoretical element that guarantees the presence of memory is the echo state property, which states that the internal dynamics of the reservoir are governed by the recent history of the inputs. The immediate consequence is that the weights of this layer can be defined beforehand, so the adjustment process can be limited to training the output layer, as in the ELM [8].
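The ESN training scheme can be sketched in the same way. This is a minimal illustration under stated assumptions, not the configuration of this study: a single scalar input, a tanh reservoir whose recurrent matrix is rescaled to a spectral radius of 0.9 (a common heuristic associated with the echo state property, not a guarantee of it), recurrence only inside the reservoir with no output feedback, a washout of 50 steps whose states are discarded, and a linear readout fitted with the pseudoinverse, as in the ELM. The function names are hypothetical.

```python
import numpy as np


def make_reservoir(n_res, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Random, fixed input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius (< 1).
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-input_scale, input_scale, size=n_res)
    return W_in, W


def reservoir_states(u, W_in, W):
    """Run the reservoir over the input sequence u and collect its states."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(u), W.shape[0]))
    for t, u_t in enumerate(u):
        x = np.tanh(W_in * u_t + W @ x)  # the state carries memory of past inputs
        states[t] = x
    return np.column_stack([states, np.ones(len(u))])  # bias column for the readout


def esn_fit(u, d, W_in, W, washout=50):
    S = reservoir_states(u, W_in, W)[washout:]
    return np.linalg.pinv(S) @ d[washout:]  # only the readout is trained


def esn_predict(u, W_in, W, w_out, washout=50):
    return reservoir_states(u, W_in, W)[washout:] @ w_out


# Usage: one-step-ahead prediction of a sampled sine wave.
series = np.sin(0.2 * np.arange(500))
u, d = series[:-1], series[1:]  # input at time t, target at time t + 1
W_in, W = make_reservoir(n_res=50)
w_out = esn_fit(u, d, W_in, W)
pred = esn_predict(u, W_in, W, w_out)
mse = np.mean((pred - d[50:]) ** 2)
print(f"one-step-ahead MSE: {mse:.2e}")
```

The reservoir weights `W_in` and `W` are fixed once drawn, so, as in the ELM, training reduces to one least-squares solve over the collected states.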


III. METHODOLOGY 

Data preprocessing is important for the direct application of neural networks. Here, the standardization described in Equation (1) is applied [10]:

$$z_i = \frac{x_i - \bar{x}}{\sigma} \tag{1}$$

where $i$ is the index of each sample, $x_i$ is the original sample, $\bar{x}$ is the sample mean and $\sigma$ is the standard deviation. The standardized series $z$ has zero mean and unit standard deviation. The input lags were defined by preliminary tests: lags 2, 3, 4, 5, and 9 were selected for PM10, and lags 1, 2, 4, 8, 9, and 10 for PM2.5.
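The preprocessing can be sketched as follows, assuming the daily series is held in a NumPy array. The helper names are hypothetical, and using the population standard deviation (NumPy's default, `ddof=0`) is an assumption, since the text does not say which estimator was used. In a forecasting setting, the mean and standard deviation would normally be computed on the training portion only and reused for the test portion. The lag sets are the ones reported above.

```python
import numpy as np

PM10_LAGS = [2, 3, 4, 5, 9]
PM25_LAGS = [1, 2, 4, 8, 9, 10]


def standardize(x, mean=None, std=None):
    """Equation (1): z = (x - mean) / std; pass training statistics to reuse them."""
    mean = x.mean() if mean is None else mean
    std = x.std() if std is None else std
    return (x - mean) / std, mean, std


def lag_matrix(z, lags):
    """Row for time t holds z[t - k] for each k in lags; the target is z[t]."""
    max_lag = max(lags)
    X = np.column_stack([z[max_lag - k: len(z) - k] for k in lags])
    return X, z[max_lag:]


# Usage with a toy series (not the Helsinki data).
x = np.arange(20.0)
z, mean, std = standardize(x)
X, y = lag_matrix(x, PM25_LAGS)
print(X[0], y[0])  # inputs x[9], x[8], x[6], x[2], x[1], x[0] and target x[10]
```

Each row of `X` is one input pattern for a one-step-ahead predictor, and the first `max(lags)` samples are consumed to build the first row.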


The performance metrics adopted were the Mean Square 
Error (MSE) and the Mean Absolute Percentage Error 
(MAPE), presented in Equations (2) and (3) [10]: 


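$$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(d_n - y_n\right)^2 \tag{2}$$

$$\mathrm{MAPE} = \frac{100}{N}\sum_{n=1}^{N}\left|\frac{d_n - y_n}{d_n}\right| \tag{3}$$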


where $N$ is the number of samples, $y_n$ is the $n$-th predicted value and $d_n$ is the corresponding desired response.
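Equations (2) and (3) translate directly into code. The sketch below uses plain Python; expressing MAPE as a percentage (the factor 100) is an assumption consistent with the magnitudes reported in Table 1, and MAPE is undefined whenever a desired response $d_n$ equals zero. For the real-domain metrics, predictions made on the standardized scale would first be mapped back with $x = z\sigma + \bar{x}$; the helper `destandardize` is a hypothetical name for that step.

```python
def mse(d, y):
    """Equation (2): mean of squared errors between desired d_n and predicted y_n."""
    return sum((dn - yn) ** 2 for dn, yn in zip(d, y)) / len(d)


def mape(d, y):
    """Equation (3): mean absolute percentage error, in percent (d_n must be nonzero)."""
    return 100.0 * sum(abs((dn - yn) / dn) for dn, yn in zip(d, y)) / len(d)


def destandardize(z, mean, std):
    """Map standardized values back to the original scale (inverse of Equation (1))."""
    return [zn * std + mean for zn in z]


# Usage with made-up values (not results from this study).
d = [10.0, 20.0]
y = [12.0, 18.0]
print(mse(d, y), mape(d, y))
```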


IV. COMPUTATIONAL RESULTS

PM10 and PM2.5 concentrations were predicted for Helsinki, Finland. To model and predict PM time series, the most common method is a weighted combination of past observations of the same variable. The database consists of daily data from 2001 to 2003 for the Kallio and Vallila stations [11]. These stations are located in regions with distinct characteristics: Kallio is an urban background station, while Vallila is located downtown and is influenced by traffic. Consequently, the population around the Kallio station is less exposed to air pollution than the population around Vallila, which is severely exposed to it.

The computational results consider one-step-ahead forecasting and are presented in Table 1, which shows the average of 30 simulations. The metrics used to evaluate the performance of the neural networks are the MSE and MAPE in the real domain (on the scale of the original data), and only the MSE for the standardized data. The label “NN” denotes the number of neurons in the intermediate layer of the best-performing network. “K” refers to the Kallio station and “V” to the Vallila station. The first observation is that there is no direct relation between performance and the number of neurons (processing units). The ESN always used fewer neurons than the ELM. Interestingly, the ELM achieved the best performance for PM10 forecasting, whereas for PM2.5 the opposite held, with the ESN giving the best predictions. This may indicate which type of neural network, feedforward or recurrent, is more suitable for each task. The Friedman test [12] was performed to assess the statistical significance of the results. The p-values found were close to zero, which supports the hypothesis that changing the predictor leads to distinct results.
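A minimal sketch of the Friedman test in plain Python follows. How the blocks were formed is not stated in the text; here each block is assumed to be one paired observation (for example, one of the 30 simulations) holding one error value per predictor. The statistic uses average ranks for ties but omits the tie correction, and the p-value comes from the usual chi-square approximation with k - 1 degrees of freedom, computed with closed forms valid for integer degrees of freedom. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..k of the values (lowest = 1); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))
    total = math.erfc(math.sqrt(h))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def friedman(blocks):
    """Friedman statistic and chi-square p-value; blocks[i][j] = error of predictor j in block i."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, chi2_sf(q, k - 1)


# Usage with made-up errors: predictor 0 beats predictor 1 in every block.
blocks = [[0.10 + 0.001 * i, 0.20 + 0.001 * i] for i in range(30)]
q, p = friedman(blocks)
print(f"Q = {q:.2f}, p = {p:.1e}")
```

A small p-value rejects the null hypothesis that all predictors have the same rank distribution; SciPy's `scipy.stats.friedmanchisquare` offers a library alternative, though it requires at least three predictors.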


Table 1: Mean Square Error (MSE) and Mean Absolute Percentage Error (MAPE) of the computational results for PM10 and PM2.5 using ELM and ESN at the Kallio (K) and Vallila (V) stations


Station | Metric | PM10 ELM | PM10 ESN | PM2.5 ELM | PM2.5 ESN
K | NN | 120 | 3 | 10 | 3
K | MSE (standardized) | 0.0052 | 0.0057 | 0.0065 | 0.0046
K | MSE | 33.1995 | 36.5808 | 21.1057 | 14.8165
K | MAPE (%) | 32.5936 | 34.7290 | 42.7161 | 37.8863
V | NN | 70 | 3 | 70 | 3
V | MSE (standardized) | 0.0029 | 0.0034 | 0.0046 | 0.0033
V | MSE | 55.0645 | 64.0550 | 22.6697 | 16.4340
V | MAPE (%) | 33.6415 | 43.3059 | 39.9475 | 35.7573


Figure 1 compares the forecasts of the best cases found with the observed data, for PM10 at the Kallio and Vallila stations and for PM2.5 at both stations, respectively. The ELM and ESN forecasts fit the observed data well.
Fig. 1: Best predictions achieved by the unorganized machines, in μg/m³: (a) Kallio and PM10; (b) Vallila and PM10; (c) Kallio and PM2.5; (d) Vallila and PM2.5.

V. CONCLUSIONS 

The main reasons for using the ELM and ESN architectures are the good forecasting results reported in the literature, combined with a simple and efficient training process. The computational results showed that the ELM achieved the best performance in predicting PM10 concentrations, while the ESN showed better results for PM2.5. These results could provide governments with prediction tools to support mitigation measures. Future research could explore regularized ELMs and ESNs, nonlinear output layers, and variable selection techniques, which may increase the overall performance of the models.